Role: AI Architect
Location: Brussels, Belgium
Experience level: 15+ years
Job Description:
1) Architecture & Solution Design
- Define reference architectures for GenAI systems: RAG, agentic orchestration, tool/function calling, multi-step reasoning workflows, memory patterns, and context strategies.
- Design multi-tenant and enterprise-scale GenAI platforms with clear separation of concerns: UI, orchestration, retrieval, inference, evaluation, and observability.
- Select model strategies: hosted LLMs, open-weight models, fine-tuning vs. prompt/RAG, latency and cost tradeoffs, and deployment patterns.
2) Agentic AI Orchestration & Tooling
- Architect agent systems (single- and multi-agent), including:
  - Task decomposition, planners/executors, and reflection/verification loops
  - Tool use patterns (APIs, databases, search, workflow engines)
  - Guardrails to prevent unsafe tool actions and hallucinated commands
- Build reliable flows for "human-in-the-loop" decision points and approvals (e.g., procurement, customer comms, incident triage).
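The guardrail and human-in-the-loop bullets above can be sketched as a minimal tool-dispatch loop; the tool names, registry shape, and approval callback below are hypothetical illustrations, not a prescribed implementation:

```python
# Minimal sketch of an agent tool-dispatch loop with a human-in-the-loop
# approval gate. Tool names and step shape are hypothetical.

SAFE_TOOLS = {"search", "read_db"}             # auto-approved tools
SENSITIVE_TOOLS = {"send_email", "purchase"}   # require human approval

def execute_step(step, approve):
    """Run one agent step; sensitive tools go through the approval callback."""
    tool, args = step["tool"], step["args"]
    if tool in SENSITIVE_TOOLS and not approve(tool, args):
        return {"tool": tool, "status": "rejected"}
    if tool not in SAFE_TOOLS | SENSITIVE_TOOLS:
        # Guardrail: never execute a tool the registry does not know about
        # (defends against hallucinated tool names).
        return {"tool": tool, "status": "unknown_tool"}
    return {"tool": tool, "status": "executed"}

def run_plan(steps, approve):
    """Execute a planned sequence of steps under the approval policy."""
    return [execute_step(s, approve) for s in steps]
```

The approval callback is the "human" decision point: in production it would block on a review queue rather than a lambda.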
3) Retrieval, Knowledge Systems & Data Design
- Lead the design of knowledge ingestion pipelines: document parsing, chunking strategies, embeddings, metadata, lineage, and freshness SLAs.
- Architect vector search and hybrid retrieval: semantic + keyword search, reranking, filtering, and ACL-aware retrieval.
- Ensure retrieval respects access control, PII handling, data residency, and auditability.
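A minimal sketch of the ACL-aware hybrid retrieval described above; the document shape, the linear score blend, and the stubbed semantic scorer are illustrative assumptions:

```python
# Minimal sketch of ACL-aware hybrid retrieval: blend a keyword score with a
# (pluggable) semantic score, filtering out documents the caller cannot read.

def keyword_score(query, doc):
    """Fraction of query terms that appear in the document text."""
    terms = set(query.lower().split())
    words = set(doc["text"].lower().split())
    return len(terms & words) / max(len(terms), 1)

def hybrid_search(query, docs, user_groups, semantic_score, alpha=0.5):
    """Return documents ranked by blended score, ACL-filtered first."""
    visible = [d for d in docs if d["acl"] & user_groups]  # ACL-aware filter
    scored = [
        (alpha * semantic_score(query, d) + (1 - alpha) * keyword_score(query, d), d)
        for d in visible
    ]
    return [d for score, d in sorted(scored, key=lambda pair: -pair[0])]
```

Filtering by ACL before scoring (rather than after) is what keeps restricted content out of both the results and any downstream prompt context.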
4) Production Engineering, Reliability & Cost
- Set non-functional requirements for GenAI workloads: SLOs, latency budgets, fallback models, caching, and rate limiting.
- Design cost controls: prompt/token optimization, model routing, batching, and usage governance.
- Implement resiliency patterns: circuit breakers, retries, queue-based orchestration, idempotency.
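Two of the resiliency patterns named above, retries with exponential backoff behind a simple failure-count circuit breaker, can be sketched as follows; the thresholds and the injectable sleep function are assumptions made for testability:

```python
import time

# Minimal sketch of retry-with-backoff wrapped in a circuit breaker.
# The breaker counts failed call() invocations; once the threshold is
# reached, it fails fast instead of hammering a degraded dependency.

class CircuitBreaker:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.max_failures

    def call(self, fn, retries=2, base_delay=0.01, sleep=time.sleep):
        if self.open:
            raise RuntimeError("circuit open: failing fast")
        for attempt in range(retries + 1):
            try:
                result = fn()
                self.failures = 0  # success resets the breaker
                return result
            except Exception:
                sleep(base_delay * 2 ** attempt)  # exponential backoff
        self.failures += 1
        raise RuntimeError("all retries exhausted")
```

A production version would add jitter, a half-open probe state, and idempotency keys on the wrapped calls.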
5) Security, Risk & Responsible AI
- Establish the AI security posture: prompt injection defenses, data exfiltration controls, and tool sandboxing.
- Define policies and controls for sensitive data, logging, redaction, encryption, secret management, and auditing.
- Collaborate with risk/compliance teams to drive model governance, content safety, bias/quality monitoring, and regulatory alignment.
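The redaction control named above can be illustrated, in miniature, as a log-scrubbing step applied before audit storage; the two regexes are illustrative assumptions, not a complete PII or secrets policy:

```python
import re

# Minimal sketch of log redaction before audit storage: mask email
# addresses and secret-like tokens. Patterns are illustrative only.

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SECRET = re.compile(r"sk-[A-Za-z0-9]{8,}")

def redact(text):
    """Replace emails and secret-like tokens with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return SECRET.sub("[SECRET]", text)
```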
6) Evaluation, Observability & Continuous Improvement
- Create evaluation frameworks: offline evals (golden sets), automated regression, and scenario-based testing.
- Instrument systems for observability: traces, prompt versioning, retrieval diagnostics, tool-call logs, and outcome metrics.
- Run A/B tests and iterate on prompts, retrieval, and agent policies based on measurable outcomes.
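In its simplest form, the offline-eval bullet above reduces to scoring a system under test against a golden set; the `system` callable, exact-match metric, and pass threshold are assumptions for illustration:

```python
# Minimal sketch of an offline eval: run the system under test over
# (input, expected) pairs and compute exact-match accuracy against a
# pass threshold. Real evals would add rubric or semantic scoring.

def evaluate(system, golden_set, threshold=0.9):
    """Return accuracy over the golden set and whether it clears the bar."""
    hits = sum(1 for query, expected in golden_set if system(query) == expected)
    accuracy = hits / len(golden_set)
    return {"accuracy": accuracy, "passed": accuracy >= threshold}
```

Wiring this into CI gives the automated regression gate the bullet describes: a prompt or retrieval change that drops accuracy below the threshold fails the build.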
7) Leadership & Stakeholder Management
- Partner with product leaders to identify high-value use cases and define roadmap.
- Mentor engineers and data scientists on best practices for LLM apps.
- Produce architecture artifacts: ADRs, threat models, system diagrams, runbooks.
Required Skills & Experience
Core Technical Skills (Must Have)
- 8+ years in software/solution architecture, including 2+ years delivering GenAI/LLM solutions in production.
- Strong knowledge of LLMs: prompting patterns, context windows, tool/function calling, model limitations, and safety risks.
- Agentic AI design experience: orchestrators, workflows, multi-step reasoning, tool usage, and HITL patterns.
- RAG expertise: embeddings, vector DBs, hybrid retrieval, reranking, chunking strategies, and evaluation.
- Cloud architecture (Azure/AWS/GCP) with production engineering rigor: microservices, containers (Docker/K8s), serverless, and CI/CD.
- Solid programming skills in one or more of: Python, TypeScript/JavaScript, Java, C#.
- Experience with APIs and integration patterns: REST/gRPC, event-driven systems, queues, and workflow engines.
Security & Governance (Must Have)
- Understanding of GenAI-specific threats: prompt injection, data leakage, jailbreaks, and insecure tool calling.
- Familiarity with enterprise controls: IAM, key management, encryption, network isolation, and audit logging.
- Responsible AI practices: evaluation, content moderation, privacy, and compliance-by-design.
Architecture & Systems Skills (Must Have)
- Distributed system design: scalability, fault tolerance, caching, and performance tuning.
- Observability: logging/metrics/tracing, prompt/version tracking, and monitoring of SLIs/SLOs.
- Cost management and performance optimization: model selection/routing, token reduction, caching, and batching.
Preferred / Nice-to-Have Skills
- Fine-tuning approaches: LoRA/QLoRA, instruction tuning, adapters, and distillation (when appropriate).
- Experience with knowledge graphs, semantic layers, and enterprise search.
- Advanced evaluation: LLM-as-judge with safeguards, rubric scoring, and adversarial testing.
- MLOps/LLMOps toolchains: experiment tracking, feature stores, model registries, and data quality tools.
- Domain experience: customer support automation, developer productivity copilots, IT ops agents, finance or healthcare compliance.
- Experience building platforms: reusable agent frameworks, reusable RAG components, and multi-team enablement.
For more information on how we process your personal data, please refer to HCLTech's Candidate Data Privacy Notice.
Apply